///////////////////////////////////
// ``An Informed Forensics...'' //
// replication materials README //
//////////////////////////////////

// Authors:
Jacob M. Montgomery
Santiago Olivella
Joshua D. Potter
Brian F. Crisp


// Dependencies:
- plyr
- ltm
- psy
- foreach 
- doMC 
- glmnet
- BayesTree
- abind
- lattice
- latticeExtra
- RColorBrewer
- doMPI (only needed to run routines on a cluster computer)
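
// Checking dependencies (illustrative sketch, not part of the replication files)
Before running the scripts, it may help to confirm that the packages listed
above are installed. The R snippet below is a minimal sketch of one way to
do so. The object names `deps' and `missing_pkgs' are hypothetical and do not
appear in the replication scripts. doMPI is left out because it is needed
only for the cluster runs.

```r
# Packages listed under "Dependencies" (doMPI omitted: cluster runs only).
deps <- c("plyr", "ltm", "psy", "foreach", "doMC", "glmnet",
          "BayesTree", "abind", "lattice", "latticeExtra", "RColorBrewer")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- deps[!installed]

# Report the result without installing anything.
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

The sketch only reports what is missing; how each package is installed is
left to the user.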


// Replication file description
// `s' denotes script;
// `d' denotes directory;
// `o' denotes other file type.

  - Masterfile.R (s): Main script. Runs every other script
                      and reproduces every number reported in the manuscript.
  - DataProcessing.R (s): Contains various data management
                          operations, including creation of BART
                          objects. Defines function data.preproc().
  - GenErrorAuxFunc.R (s): Contains various functions used in the
                           process of calculating generalization
                           errors. Defines various functions, most
                           importantly err.generator().
  
  - BartDemo.ps (o): Figure B2 in Appendix F 
  - BartIllustration.R (s): Produces plot BartDemo.ps

  - Data (d): Contains our train and test sets, as well as Nelda, Birch,
              and QED data.

  - Both.ps (o): Figure 7.
  - ExtValidationPlots.R (s): Produces plot Both.ps.
  - FullBart.RData (o): R object with a single BART run of the fully specified model,
                        estimated using the entire dataset (i.e., train + test).

  - FraudScoreIRT.R (s): Estimates IRT model to produce full fraud proxy and
                         alternative fraud proxy (using fewer items).
  
  - GenErrorTestSet.R (s): Estimates models for three BART specifications, and
                           produces Table 2.
  - SpecSens.R (s): Dichotomizes predictions and calculates Specificity and Sensitivity
                    values.
  - BootAnalysis.R (s): Loads results of bootstrap generalization error estimation
                        and produces Table B3 in Appendix E.
  - GenErrorBoot*.R (s): Three scripts that estimate, for each BART specification,
                         the LOO block-bootstrap generalization error. They produce 
                         .RData objects with estimation output.
  - GenErrorBoot*.job (s): Three scripts that submit bootstrap estimation jobs to the LSF
                           queueing system on the Pegasus cluster (University of Miami CCS).

  - kFoldAnalysis.R (s): Loads results of 15-fold generalization error estimation
                         and produces results reported in Appendix E.
  - GenErrorKFold*.R (s): Three scripts that estimate, for each BART specification,
                          the 15-fold generalization error. They produce 
                          .RData objects with estimation output.
  - GenErrorkFold*.job (s): Three scripts that submit k-fold estimation jobs to the LSF
                            queueing system on the Pegasus cluster (University of Miami CCS).
  - Output (d): Contains RData objects with kFold and Bootstrap estimation results, for
                each BART specification. 
  
  - BootstrapParamSweep.R (s): Performs parameter sweep with 100 block-bootstrapped samples
                               for 32 BART parameter profiles. Produces .csv file of
                               error measures and parameter profile rankings.
  - BSSweep.job (s): Submits BootstrapParamSweep.R to the LSF queueing system on the Pegasus
                     cluster (University of Miami CCS).
  - BSParamSweep.csv (o): Output from BootstrapParamSweep.R.

  - CatVariablesref.ps (o): Figure 5.
  - ContVariablesRef.ps (o): Figure 4.
  - InteractionPlot.ps (o): Figure 6.
  - ResultsVisual.R (s): Loads object containing results of pdbart() runs and
                         produces Figures 4 and 5. Depends on object produced
                         by PartialDepPlots.R.
  - ResultsVisual3d.R (s): Loads object containing results of pd2bart() runs
                           and produces Figure 6. Depends on object produced by
                           InteractionPlots.R.
  - BartInteractionPlots.job (s): Submits the script that produces the two-way partial
                                  dependence to the LSF queueing system on the Pegasus
                                  cluster (University of Miami CCS).
  - InteractionPlots.R (s): Runs pd2bart() to get the two-way partial dependence between
                            turnout and last-digit distance. Requires over 100 GB of RAM.
                            Produces pd2bart object and stores it in Output subfolder.
  - PartialDepPlots.R (s): Runs pdbart() to get partial dependencies for each predictor.
                           Produces pdbart object and stores it in Output subfolder.
  - Output (d): Contains .RData objects produced by InteractionPlots.R and
                PartialDepPlots.R, which store posterior samples from the partial 
                dependence functions defined in said scripts.

  - InclProbPlots.R (s): Estimates a single, fully specified BART model using both the
                         train and test sets as training data, stores results for external
                         validation calculations, and produces a plot of variable
                         inclusion probabilities (Figure 3).
  - InclusionProbs.ps (o): Figure 3.

  - README (o): This readme file.

/////////////////////////////////////////////////////////////////////////////////////////////////////


      

